 singular vector




What Makes and Breaks Safety Fine-tuning? A Mechanistic Study

Neural Information Processing Systems

Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., "design") versus the specific concepts the task is asked to be performed upon (e.g., a "cycle" vs. a "bomb").



Appendix for "Residual Alignment: Uncovering the Mechanisms of Residual Networks"

Neural Information Processing Systems

We start by providing motivation for the unconstrained Jacobians problem introduced in the main text. We will continue our proof using contradiction. Figure 1: Fully-connected ResNet34 (Type 1 model) trained on MNIST. Figure 2: Fully-connected ResNet34 (Type 1 model) trained on FashionMNIST. Figure 10: Fully-connected ResNet34 (Type 1 model) trained on MNIST. Figure 24: Fully-connected ResNet34 (Type 1 model) trained on MNIST.





Lipschitz regularity of deep neural networks: analysis and efficient estimation

Aladin Virmaux, Kevin Scaman

Neural Information Processing Systems

Deep neural networks are notorious for being sensitive to small well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. In this paper, we investigate one of the key characteristics to assess the regularity of such methods: the Lipschitz constant of deep learning architectures.
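As a rough illustration of what such a constant looks like in practice, the sketch below computes a standard upper bound on the Lipschitz constant of a feed-forward network: the product of the spectral norms (largest singular values) of its weight matrices, assuming every activation is 1-Lipschitz (as ReLU is). This is not necessarily the estimator studied in the paper, which the abstract does not describe. The function names and the two toy weight matrices are hypothetical.

```python
import math
import random


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def spectral_norm(A, iters=200, seed=0):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    rng = random.Random(seed)
    At = transpose(A)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # v approximates the top right singular vector, so ||A v|| approximates
    # the top singular value.
    return math.sqrt(sum(x * x for x in matvec(A, v)))


def lipschitz_upper_bound(weights):
    """Product of per-layer spectral norms.

    Assumption: activations between layers are 1-Lipschitz. The result is an
    upper bound on the network's Lipschitz constant, not its exact value.
    """
    bound = 1.0
    for W in weights:
        bound *= spectral_norm(W)
    return bound


# Hypothetical two-layer network with hand-picked weights.
W1 = [[3.0, 0.0], [0.0, 1.0]]   # largest singular value 3
W2 = [[0.0, 2.0], [0.5, 0.0]]   # largest singular value 2
bound = lipschitz_upper_bound([W1, W2])
print(bound)
```

The bound is cheap to compute but can be loose, since it ignores how the singular vectors of consecutive layers align with each other.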


c0c783b5fc0d7d808f1d14a6e9c8280d-Paper.pdf

Neural Information Processing Systems

A major hurdle in this study is that implicit regularization in deep learning seems to kick in only with certain types of data (not with random data, for example), and we lack mathematical tools for reasoning about real-life data. Thus one needs a simple test-bed for the investigation, where data admits a crisp mathematical formulation. Following earlier works, we focus on the problem of matrix completion: given a randomly chosen subset of entries from an unknown matrix W, the task is to recover the unseen entries. To cast this as a prediction problem, we may view each entry in W as a data point: observed entries constitute the training set, and the average reconstruction error over the unobserved entries is the test error, quantifying generalization. Fitting the observed entries is obviously an underdetermined problem with multiple solutions.
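The setup above can be sketched in a few lines. This is a minimal illustration, assuming squared error as the reconstruction error and a small hand-built rank-1 matrix as the unknown W; the helper names (`split_entries`, `mean_squared_error`) are hypothetical. It shows why the problem is underdetermined: two different candidates fit the observed entries exactly but differ in test error.

```python
import random


def split_entries(n_rows, n_cols, n_observed, seed=0):
    """Randomly choose which entries are observed (train) and which are not (test)."""
    rng = random.Random(seed)
    all_entries = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    observed = set(rng.sample(all_entries, n_observed))
    unobserved = [e for e in all_entries if e not in observed]
    return sorted(observed), unobserved


def mean_squared_error(W_true, W_hat, entries):
    """Average squared reconstruction error over the given entries."""
    return sum((W_true[i][j] - W_hat[i][j]) ** 2 for i, j in entries) / len(entries)


# Assumed ground truth: a rank-1 matrix W = u v^T with no zero entries.
u = [1.0, 2.0, 3.0]
v = [1.0, -1.0, 0.5, 2.0]
W = [[ui * vj for vj in v] for ui in u]

train, test = split_entries(3, 4, n_observed=8)
train_set = set(train)

# Candidate 1: the true matrix itself.
# Candidate 2: agrees with W on observed entries, zero everywhere else.
W_zero_fill = [
    [W[i][j] if (i, j) in train_set else 0.0 for j in range(4)]
    for i in range(3)
]

train_err_true = mean_squared_error(W, W, train)
train_err_fill = mean_squared_error(W, W_zero_fill, train)
test_err_true = mean_squared_error(W, W, test)
test_err_fill = mean_squared_error(W, W_zero_fill, test)

print(train_err_true, train_err_fill)  # both candidates fit the training set
print(test_err_true, test_err_fill)    # only the first generalizes
```

Both candidates have zero training error, so the observed entries alone cannot distinguish them; which one a learning procedure returns depends on its implicit bias.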